Abstract. Consider N points whose coordinates are randomly generated along
the edges of a d-dimensional hypercube (random point problem). The probability
that an arbitrary point is the mth nearest neighbor to its own nth nearest neighbor
(Cox probabilities) plays an important role in spatial statistics. It has also been
useful in the description of physical processes in disordered media. Here we propose a
simpler derivation of Cox probabilities, where we stress the role played by the system
dimensionality d. In the limit $d \to \infty$, the distances between pairs of points become
independent (random link model) and closed analytical forms for the neighborhood
probabilities are obtained both for the thermodynamic limit and for finite-size systems.
Breaking the distance symmetry constraint leads to the random map model, for
which the Cox probabilities are obtained for two cases: whether a point is its own
nearest neighbor or not.
1. Introduction

Consider N points independently and uniformly distributed along the edges of a
d-dimensional hypercube. The determination of the distance and neighborhood statistics
between any pair of points is known as the random point problem (RPP). This is a
standard approach to construct disordered (random) media.

Due to boundary effects and triangular restrictions, the distances between
pairs of points are not all independent random variables. For fixed N in the RPP, as
the system dimensionality d increases, the boundary effects become more and more
pronounced and the distances between pairs of points become less and less correlated.
One can minimize boundary effects by considering periodic boundary conditions, and in the
limit $d \to \infty$ all the two-point distances are independent and identically distributed
(i.i.d.) random variables. This is the random link (distance) model (RLM) [?], which is
a mean field description of the RPP.

In the RLM, there exist two Euclidean constraints: (i) the distance from a point
to itself is always null ($D_{i,i} = 0$, for all i) and (ii) the forward and backward distances
are equal ($D_{i,j} = D_{j,i}$, for all i, j). The second constraint imposes serious numerical
difficulties and an efficient numerical implementation for the RLM is given in Ref. [2]. If
the distance symmetry constraint is broken, the model becomes the random map model
(RMM) [?]. In this latter model, a point can either be its own nearest neighbor
($D_{i,i} = 0$) or not ($D_{i,i} \neq 0$). The latter is the mean field approximation for Kauffman
automata [?].

Both the RPP and the RLM have been very fruitful in the determination of numerical
and analytical results in several interesting systems. Applications range from statistics
on the optimal trajectories in the context of the traveling salesman problem on a random
set of cities [?], through frustrated dimerization optimization modeled
by the minimum matching problem [?] (or equivalently spin glasses [?]), to the
partially self-avoiding deterministic tourist walk [13, ?] and its random
version [?]. Partially self-avoiding walks have been our main motivation to address
the RPP and its mean field models. Although the distance distribution as a function
of the dimensionality d plays an important role in the random tourist version, in the
deterministic case one is mainly interested in the neighborhood ranking of random
points.

As pointed out above, boundary effects become important as the dimensionality of the
system increases. The points get closer to the surface and, to capture the bulk effect,
one must increase N. In certain systems it may be difficult to reach such large N values
and it would be useful to have analytical expressions for finite N, for instance, to test
the reliability of numerical codes or to develop new statistical tests.

Here we focus on the distribution of neighborhood ranks. The probability that
an arbitrary point is the mth nearest neighbor of its own nth nearest neighbor in
the RPP has attracted the attention of researchers since the seminal studies of Clark and
Evans [?] and Clark [20]. They coined the term reflexive neighbors for the case m = n.
Their calculated reflexive neighborhood probability ranking was corrected by
Dacey (m > 1) and then generalized (for $m \neq n$) by Cox [21]; we call the resulting
quantities the Cox probabilities.

In this paper, in Sec. 2 we obtain the Cox probabilities using only the Poisson
distribution instead of the various distinct distributions used in the original paper [21].
As in Cox's calculation, we write the probabilities in the thermodynamic limit $N \to \infty$.
Unlike Cox, we write them in terms of known functions (rather than in terms of an
integral). In Sec. 3 the use of known special functions allows us to take the high
dimensionality limit, which leads to the RLM neighborhood probability. Using the same
arguments used to obtain the Cox probabilities, we are able to obtain the neighborhood
probability for finite-size RLM systems. Finally, in Sec. 4 we explicitly write the Cox
probabilities for the two considered cases of the RMM. After the concluding remarks
(Sec. 5), the discrete probability distributions used are briefly reviewed in the Appendix
to set up notation. All analytical results have been compared with and validated by
numerical Monte Carlo simulations.

2. Alternative Derivation of Cox Probabilities 

This alternative derivation of the Cox formula is simpler than the original one, since
it uses only the Poisson distribution, rather than the Poisson, binomial and gamma
distributions as in the original paper.

In a d-dimensional Poissonic medium with a mean density $\lambda$ of points per
unitary volume, the probability that a volume $V_d$ (with an arbitrary shape, even with
disconnected parts) contains k points is given by the Poisson distribution
$\mathrm{pois}(k) = \mu^k e^{-\mu}/k!$, where $k = 0, 1, 2, \ldots, \infty$ and
$\mu = \langle k \rangle = \lambda V_d$ is the expected number of points
inside the volume $V_d$. Notice that the thermodynamic limit is taken by letting k vary
freely and that the only parameter of this distribution is $\mu$ (the medium
dimensionality d is not a relevant quantity).

Let I and J be two points of a d-dimensional Poissonic medium separated by a
distance r. The volume $V_d(r)$ of the hypersphere of radius r centered at I (which
thus passes through J) is

\[ V_d(r) = \frac{\pi^{d/2} r^d}{\Gamma(d/2+1)} = \frac{\pi^{(d-1)/2} r^d\, B[1/2,(d+1)/2]}{\Gamma[(d+1)/2]} , \]

where $\Gamma(z) = \int_0^\infty dt\, t^{z-1} e^{-t} = (z-1)!$ is the gamma function [22] and
$B(a,b) = B(b,a) = \int_0^1 dt\, t^{a-1}(1-t)^{b-1} = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the
beta function [22]. While the former generalizes the factorial, the latter generalizes
the inverse of the Newton binomial coefficient. Obviously the volume of the hypersphere
centered at J passing through I is also $V_d(r)$. Fig. 1 shows the case d = 2.

Figure 1. Two-dimensional Poissonic process. The circles centered at the points I
and J have area $V_2 = \pi r^2$ and their intersection has area
$V_{\cap,2} = V_2(1 - p_2) = (2\pi/3 - \sqrt{3}/2) r^2$. There are i points in the
intersection of the two circles, and in the I and J crescents there are $n - 1 - i$ and
$m - 1 - i$ points, respectively.

The volume $V_{\cap,d}(r)$ of the intersection of these two hyperspheres is

\[ V_{\cap,d}(r) = \frac{\pi^{(d-1)/2} r^d}{\Gamma[(d+1)/2]} \int_{1/4}^{1} dt\, t^{-1/2} (1-t)^{(d-1)/2} . \]

The relative volume of a crescent (compared to one hypersphere) is
$p_d = [V_d(r) - V_{\cap,d}(r)]/V_d(r) = \int_0^{1/4} dt\, t^{-1/2} (1-t)^{(d-1)/2} / B[1/2,(d+1)/2]$, or

\[ p_d = I_{1/4}\!\left(\frac{1}{2}, \frac{d+1}{2}\right) , \tag{1} \]

where $I_z(a,b) = \int_0^z dt\, t^{a-1}(1-t)^{b-1}/B(a,b)$, with $\mathrm{Re}(a) > 0$ and
$\mathrm{Re}(b) > 0$, is the normalized incomplete beta function [22]. Notice that $p_d$
depends exclusively on the dimensionality d and does not depend on the hypersphere radius r.
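As a numerical illustration of the crescent fraction $p_d$, the following sketch evaluates the incomplete beta integral with Simpson's rule. The function name `crescent_fraction` and the step count are choices made for this example only; the substitution $t = u^2$ is used to remove the integrable singularity at $t = 0$.

```python
import math


def crescent_fraction(d, steps=2000):
    """Relative crescent volume p_d = I_{1/4}(1/2, (d+1)/2), evaluated numerically."""
    # Substituting t = u**2 removes the t**(-1/2) endpoint singularity:
    #   int_0^{1/4} t^(-1/2) (1-t)^((d-1)/2) dt = 2 * int_0^{1/2} (1-u^2)^((d-1)/2) du
    def integrand(u):
        return (1.0 - u * u) ** ((d - 1) / 2)

    h = 0.5 / steps  # Simpson's rule: 'steps' must be even
    total = integrand(0.0) + integrand(0.5)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = 2.0 * total * h / 3.0

    # Complete beta function B(1/2, (d+1)/2) written with gamma functions
    beta = math.gamma(0.5) * math.gamma((d + 1) / 2) / math.gamma(d / 2 + 1)
    return integral / beta


for d in (0, 1, 2, 3):
    print(d, round(crescent_fraction(d), 6))
```

For d = 0, 1 and 3 the output can be compared with the closed values 1/3, 1/2 and 11/16, and for d = 2 with $(2\pi + 3\sqrt{3})/(6\pi)$, which follows from the intersection area quoted for Fig. 1.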

It is interesting to mention that $p_d$ plays an important role in the parametrization
of the deterministic tourist walk problem [?]. It can be generalized to an
arbitrary distance $D_{I,J} = r x$ between the points I and J, with x ranging from 0
to 2 (from concentric hyperspheres to disjoint ones). In this case, one has
$p_d(x) = I_{(x/2)^2}[1/2, (d+1)/2]$, with $p_d(1) = p_d$, which has allowed us to tackle
analytically the random tourist walk problems [?].

The following conditions must hold for I to be the mth nearest neighbor of J and J
to be the nth nearest neighbor of I:

(i) there must exist i points inside the intersection of the hyperspheres, with i ranging
from 0 to $\min(m - 1, n - 1)$; the expected number of points is $\mu(1 - p_d)$,

(ii) there must exist $m - 1 - i$ points inside the crescent of J; the expected number of
points is $\mu p_d$,

(iii) there must exist $n - 1 - i$ points inside the crescent of I; the expected number of
points is also $\mu p_d$,

(iv) the distance r between I and J may assume any value in the interval $[0, \infty)$, so
that the volume $V_d(r)$ and the expected number of points $\mu = \lambda V_d(r)$ inside it
also vary continuously from 0 to $\infty$.
Taking these conditions altogether, one obtains the following expression for the
probability $P^{(d)}_{m,n} = P^{(d)}_{n,m}$:

\[ P^{(d)}_{m,n} = \int_0^\infty d\mu \sum_{i=0}^{\min(m-1,n-1)} \frac{[\mu(1-p_d)]^i e^{-\mu(1-p_d)}}{i!}\, \frac{(\mu p_d)^{m-1-i} e^{-\mu p_d}}{(m-1-i)!}\, \frac{(\mu p_d)^{n-1-i} e^{-\mu p_d}}{(n-1-i)!} . \]

Collecting the factors which do not depend on $\mu$, the integral can be written in terms
of the gamma function, $\int_0^\infty d\mu\, \mu^{m+n-2-i} e^{-\mu(1+p_d)} = \Gamma(m+n-1-i)/(1+p_d)^{m+n-1-i}$, and
one obtains the original form of Cox probabilities:

\[ P^{(d)}_{m,n} = \sum_{i=0}^{\min(m-1,n-1)} \frac{(m+n-2-i)!}{i!\,(m-1-i)!\,(n-1-i)!}\, \frac{(1-p_d)^i\, p_d^{\,m+n-2-2i}}{(1+p_d)^{m+n-1-i}} , \tag{2} \]

with $m = 1, 2, \ldots, \infty$ and $n = 1, 2, \ldots, \infty$. Letting i vary from 1 to
$\min(m, n)$ and rearranging the terms, the summed expression may be identified with the
multinomial distribution:
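The finite sum of Eq. (2) is easy to evaluate exactly. The sketch below uses rational arithmetic so that the results can be compared digit by digit with the fractions listed in Table 1; the function name `cox_probability` is introduced here for illustration, and the value of $p_d$ is passed in by the caller.

```python
from fractions import Fraction
from math import factorial


def cox_probability(m, n, p):
    """Eq. (2): sum over the number i of points in the intersection."""
    p = Fraction(p)
    total = Fraction(0)
    for i in range(min(m, n)):  # i = 0, ..., min(m-1, n-1)
        coeff = Fraction(
            factorial(m + n - 2 - i),
            factorial(i) * factorial(m - 1 - i) * factorial(n - 1 - i),
        )
        total += (coeff * (1 - p) ** i * p ** (m + n - 2 - 2 * i)
                  / (1 + p) ** (m + n - 1 - i))
    return total


# d = 0 corresponds to p_0 = 1/3 and d = 3 to p_3 = 11/16
print(cox_probability(1, 1, Fraction(1, 3)))    # 3/4
print(cox_probability(2, 2, Fraction(1, 3)))    # 15/32
print(cox_probability(2, 2, Fraction(11, 16)))  # 6032/19683
```

Setting p = 1 makes every term with i > 0 vanish, which gives a quick check of the random link values, for example `cox_probability(1, 2, 1)` returns 1/4.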

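\[ \frac{P^{(d)}_{m,n}}{P^{(d)}_{1,1}} = \sum_{i=1}^{\min(m,n)} \mathrm{mult}\!\left(i-1,\, m-i,\, n-i;\ \frac{1-p_d}{1+p_d},\, \frac{p_d}{1+p_d},\, \frac{p_d}{1+p_d}\right) , \tag{3} \]

where $P^{(d)}_{1,1} = 1/(1+p_d)$ is the couple density (mutually nearest neighbors) and
$\mathrm{mult}(n_a, n_b, n_c; p_a, p_b, p_c)$ is the multinomial distribution (Eq. 16).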

Notice that the Cox probability distribution is not a joint distribution. The
summation $\sum_{m,n} P^{(d)}_{m,n}$ diverges, since for each neighborhood order m the
distribution is normalized, $\sum_n P^{(d)}_{m,n} = 1$. One obtains the mean
$\langle n \rangle = m + p_d$ and the variance
$\langle n^2 \rangle - \langle n \rangle^2 = (2m + p_d - 1) p_d$. The system dimensionality d
is the bare parameter that emerges from the medium, while the considered neighborhood
order m is fixed according to convenience.
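The normalization, mean and variance stated above can be checked numerically from the multinomial form of Eq. (3). This is a minimal sketch in floating point: the helper names `mult3` and `cox_row`, the truncation `n_max = 200` and the sample values m = 3, $p_d = 0.5$ are assumptions of the example, not part of the derivation.

```python
from math import factorial


def mult3(na, nb, nc, pa, pb, pc):
    """Three-category multinomial distribution, Eq. (16) with m = 3 types."""
    n = na + nb + nc
    coeff = factorial(n) // (factorial(na) * factorial(nb) * factorial(nc))
    return coeff * pa ** na * pb ** nb * pc ** nc


def cox_row(m, p, n_max=200):
    """P_{m,n} for n = 1..n_max from Eq. (3), with couple density 1/(1+p)."""
    q_inter = (1.0 - p) / (1.0 + p)   # weight of the intersection
    q_cresc = p / (1.0 + p)           # weight of each crescent
    row = []
    for n in range(1, n_max + 1):
        s = sum(mult3(i - 1, m - i, n - i, q_inter, q_cresc, q_cresc)
                for i in range(1, min(m, n) + 1))
        row.append(s / (1.0 + p))
    return row


m, p = 3, 0.5
row = cox_row(m, p)
norm = sum(row)
mean = sum(n * q for n, q in enumerate(row, 1))
var = sum(n * n * q for n, q in enumerate(row, 1)) - mean ** 2
print(norm, mean, var)  # compare with 1, m + p and (2m + p - 1) p
```

The truncation is harmless here because the terms decay geometrically with ratio $p_d/(1+p_d)$.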

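For future reference, let us rewrite Eq. (2) close to its original form:

\[ P^{(d)}_{m,n} = \frac{1}{1-p_d} \left(\frac{p_d}{1+p_d}\right)^{m+n} \sum_{i=0}^{\min(m-1,n-1)} \left(\frac{1-p_d^2}{p_d^2}\right)^{i+1} \frac{(m+n-i-2)!}{(m-i-1)!\,(n-i-1)!\,i!} \tag{4} \]

\[ \phantom{P^{(d)}_{m,n}} = \frac{(p_d^{-1}+1)^{-(m+n)}}{1-p_d} \sum_{i=1}^{\min(m,n)} \frac{(p_d^{-2}-1)^{i}\, B(m+n-2i, 2)}{B(m+n-2i, i)\, B(m-i+1, n-i+1)} , \tag{5} \]

where, for $m = n = i$, the ratio $B(m+n-2i, 2)/B(m+n-2i, i)$ is understood as its limiting
value 1.

Numerical values of these expressions are shown in Table 1.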
3. Random Link Model and High Dimensionality Probabilities

The high dimensionality limit can be obtained directly from the Cox probabilities. It is
interesting to point out the emergence of a characteristic dimensionality $d_0 = 8$ from the
calculation. With this procedure, one can easily obtain the first order correction to the
random link model neighborhood probabilities. Next, recalling that we are considering
the thermodynamic limit, we give a geometrical interpretation of the random link
expression, which corresponds to all the points lying on the surface of the volume $V_d$.
We then extend the random link model neighborhood probabilities to finite
systems.




3.1. Thermodynamic Limit 

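Let us consider the high dimensionality situation ($d \gg 1$). This corresponds to taking
$b = (d+1)/2 \gg a = 1/2$ in Eq. (1). Since $b \gg a$, the approximation
$B(a,b) \approx \Gamma(a)/b^a$ can be used, so that
$I_z(a,b) \approx [b^a/\Gamma(a)] \int_0^z dt\, t^{a-1}(1-t)^b$. Since $t < z = 1/4$ implies
$t \ll 1$, the approximation $(1-t)^b = e^{b \ln(1-t)} \approx e^{-bt}$ yields
$I_z(a,b) \approx \gamma(a,bz)/\Gamma(a)$, where $\gamma(a,z) = \int_0^z dt\, t^{a-1} e^{-t}$ is
the non-normalized incomplete gamma function [22], which has the property
$\gamma(1/2,x) = 2\int_0^{\sqrt{x}} dt\, e^{-t^2} = \sqrt{\pi}\,\mathrm{erf}(\sqrt{x})$, with the error
function [22] defined by $\mathrm{erf}(z) = (2/\sqrt{\pi}) \int_0^z dt\, e^{-t^2}$, which
monotonically increases from $\mathrm{erf}(0) = 0$ to $\mathrm{erf}(\infty) = 1$. Since a = 1/2,
one has $I_z(a,b) \approx \gamma(1/2,bz)/\Gamma(1/2) = \mathrm{erf}(\sqrt{bz})$ and Eq. (1) can be
rewritten as:

\[ p_d \approx \mathrm{erf}\!\left(\sqrt{\frac{d+1}{d_0}}\right) \approx \mathrm{erf}\!\left(\sqrt{\frac{d}{d_0}}\right) , \tag{6} \]

where a characteristic dimensionality $d_0 = 8$ naturally emerges from the analysis.

The complementary error function is defined by
$\mathrm{erfc}(z) = (2/\sqrt{\pi}) \int_z^\infty dt\, e^{-t^2} = 1 - \mathrm{erf}(z)$. For
$|z| \gg 1$, its asymptotic series is useful:
$\mathrm{erfc}(z) = [e^{-z^2}/(z\sqrt{\pi})](1 - z^{-2}/2 + \cdots)$.
A further approximation can be performed noticing that $\mathrm{erf}(z) = 1 - \mathrm{erfc}(z)$,
so that for $|z| \gg 1$, Eq. (6) can be written as [22]:

\[ p_d \approx 1 - \mathrm{erfc}\!\left(\sqrt{\frac{d}{d_0}}\right) \approx 1 - \alpha_d , \tag{7} \]

with

\[ \alpha_d = \frac{e^{-d/8}}{\sqrt{\pi}\,\sqrt{d/8}} \left(1 - \frac{4}{d} + \cdots\right) . \tag{8} \]

Using $1 - p_d = \alpha_d$, $p_d^{-1} + 1 \approx 2$ and $p_d^{-2} - 1 \approx 2\alpha_d$ in Eq. (5),
the Cox probabilities are written as a power series in $\alpha_d$ for high dimensional systems:

\[ P^{(d \gg 1)}_{m,n} = P^{(rl)}_{m,n} + \frac{2^{2-(m+n)}}{B(m-1, n-1)}\, \alpha_d + \cdots , \tag{9} \]

where, in the random link approximation ($d \to \infty$), this probability is:

\[ P^{(rl)}_{m,n} = \frac{1}{2^{m+n-1}}\, \frac{\Gamma(m+n-1)}{\Gamma(m)\Gamma(n)} = \frac{1}{2^{m+n-1}\,(m+n-1)\,B(m,n)} , \tag{10} \]

\[ \frac{P^{(rl)}_{m,n}}{P^{(rl)}_{1,1}} = \mathrm{bin}\!\left(m-1,\, n-1;\ \frac{1}{2},\, \frac{1}{2}\right) , \tag{11} \]

where $P^{(rl)}_{1,1} = 1/2$ is the couple density and $\mathrm{bin}(n_a, n_b; p_a, p_b)$ is the
binomial distribution given by Eq. (17). Simple expressions can be obtained, such as
$P^{(rl)}_{1,n} = 1/2^n$ and $P^{(rl)}_{2,n} = n/2^{n+1}$.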
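For integer m and n the random link probability is a binomial coefficient divided by a power of two, which makes it simple to tabulate. The sketch below assumes the closed form $P^{(rl)}_{m,n} = \binom{m+n-2}{m-1}/2^{m+n-1}$, that is, one half (the couple density) times a symmetric binomial distribution; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def random_link_probability(m, n):
    """Random link probability: (1/2) * bin(m-1, n-1; 1/2, 1/2)."""
    return Fraction(comb(m + n - 2, m - 1), 2 ** (m + n - 1))


print(random_link_probability(1, 1))  # 1/2
print(random_link_probability(1, 4))  # 1/16
print(random_link_probability(2, 2))  # 1/4

# For fixed m the probabilities over n sum to one (partial sum shown for m = 3)
print(float(sum(random_link_probability(3, n) for n in range(1, 200))))
```

With this form the first row reduces to $1/2^n$, and the table entries 1/2, 1/4 and 1/4 for the random link model are recovered.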

In the high dimensionality limit $d \to \infty$, the relative volume of the crescent (Eq. 1)
tends to 1 ($p_d \to 1$) and the expected number of points $\mu(1 - p_d)$ inside the intersection
vanishes. This is easily seen if one considers a hypersphere of radius r inscribed in a
hypercube of edge 2r: as the dimensionality increases, the hypersphere volume decreases
relative to that of the hypercube and the difference of volumes increases, meaning that all
the points lie in the volume external to the hypersphere [?].
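How quickly $p_d$ approaches 1 can be seen by comparing the exact incomplete beta value with the error-function approximation $\mathrm{erf}(\sqrt{bz})$ for $b = (d+1)/2$ and $z = 1/4$. The sketch below is only illustrative: the names `p_exact`, `p_erf` and `alpha` are hypothetical, the exact value is obtained by Simpson's rule, and `alpha` assumes the two leading terms of the asymptotic erfc series evaluated at $z = \sqrt{d/8}$.

```python
import math


def p_exact(d, steps=2000):
    """p_d = I_{1/4}(1/2, (d+1)/2) by Simpson's rule after substituting t = u**2."""
    def f(u):
        return (1.0 - u * u) ** ((d - 1) / 2)

    h = 0.5 / steps
    s = f(0.0) + f(0.5) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    beta = math.gamma(0.5) * math.gamma((d + 1) / 2) / math.gamma(d / 2 + 1)
    return 2.0 * s * h / 3.0 / beta


def p_erf(d, d0=8.0):
    """High-dimensionality approximation erf(sqrt((d+1)/d0)) with d0 = 8."""
    return math.erf(math.sqrt((d + 1) / d0))


def alpha(d):
    """Assumed asymptotic form: exp(-d/8)/sqrt(pi*d/8) * (1 - 4/d)."""
    z2 = d / 8.0
    return math.exp(-z2) / math.sqrt(math.pi * z2) * (1.0 - 1.0 / (2.0 * z2))


for d in (1, 3, 10, 30, 100):
    print(d, round(p_exact(d), 5), round(p_erf(d), 5), round(1.0 - alpha(d), 5))
```

For small d the value $1 - \alpha_d$ is not meaningful, since the asymptotic series only applies when $d/8 \gg 1$; the exact and erf columns, on the other hand, stay within a few percent of each other over the whole range.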



Since $\lim_{p_d \to 1} [(1 - p_d)/(1 + p_d)]^{i-1} = \delta_{i,1}$, where $\delta_{i,j}$ is the
Kronecker delta, the multinomial distribution in Eq. (3) becomes the binomial distribution
of Eq. (11).

The numerical values relative to the high dimensionality cases are shown in Table 1.

d         p_d               P^{(d)}_{1,1}      P^{(d)}_{1,2}                  P^{(d)}_{2,2}
0         1/3               3/4                3/16                           15/32
1         1/2               2/3                2/9                            10/27
2         (2π+3√3)/(6π)     6π/(8π+3√3)        6π(2π+3√3)/(8π+3√3)^2          6π(40π^2+12√3π+27)/(8π+3√3)^3
3         11/16             16/27              176/729                        6032/19683
≫ 1       1 - α_d           (1+p_d)^{-1}       p_d (1+p_d)^{-2}               (1+p_d^2)(1+p_d)^{-3}
∞ (rl)    1                 1/2                1/4                            1/4

Table 1. Some values of the neighborhood probability. For low dimensionalities, one uses
Eq. (2). An interesting limiting case is d = 0, which yields
$p_0 = \int_0^{1/4} dt/[\pi \sqrt{t(1-t)}] = 1/3$. For $d \gg 1$, one uses
$p_d \approx 1 - \alpha_d$, and for the random link model ($d \to \infty$), one uses Eq. (10).



3.2. Finite Size System 

The RPP high dimensional limit $d \to \infty$ corresponds to the RLM, where all distances
become i.i.d. random variables. Since Euclidean distances are only a means to obtain
the neighborhood ranking probabilities, the result is independent of the particular choice
of the distance probability distribution function (pdf) [?]. For simplicity, we consider
distances between points drawn uniformly from the interval [0, 1].
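A Monte Carlo estimate for the finite-size RLM can be sketched as follows: symmetric distances are drawn uniformly in [0, 1], each point ranks the others by distance, and one counts how often point I is the mth neighbor of its own nth neighbor. The function names, the seed and the number of trials are assumptions of this example; it is not the efficient implementation cited in the Introduction.

```python
import random


def sample_rlm_ranks(N, rng):
    """rank[i][j] = neighborhood order of j as seen from i (1 = nearest)."""
    dist = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            dist[i][j] = dist[j][i] = rng.random()  # symmetric, D_ii = 0
    rank = [[0] * N for _ in range(N)]
    for i in range(N):
        others = sorted((j for j in range(N) if j != i), key=lambda j: dist[i][j])
        for order, j in enumerate(others, start=1):
            rank[i][j] = order
    return rank


def estimate_neighborhood_probability(N, m, n, trials=20000, seed=1):
    """Fraction of samples in which I is the m-th neighbor of its n-th neighbor J."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        rank = sample_rlm_ranks(N, rng)
        J = rank[0].index(n)       # point 0 plays the role of I
        hits += rank[J][0] == m
    return hits / trials


print(estimate_neighborhood_probability(6, 1, 1))
```

The estimate can be compared with the finite-size expressions derived in this section; for N = 6 and m = n = 1 it should fluctuate around the couple density.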

As before, let I be the mth nearest neighbor of J and J be the nth nearest neighbor
of I. Thus, the following conditions hold:

(i) the distance x from I to J may assume any value in the interval [0, 1],

(ii) the distances from J to each of its $m - 1$ nearest neighbors must be less than x,

(iii) the distances from J to each of its $N - m - 1$ farthest neighbors must be greater
than x,

(iv) the distances from I to each of its $n - 1$ nearest neighbors must be less than x, and

(v) the distances from I to each of its $N - n - 1$ farthest neighbors must be greater
than x.

Figure 2 illustrates the situation.

Figure 2. Schematic illustration of the points I and J and their neighbors in an N-point
random link model. Around I there are $n - 1$ points nearer than J and $N - n - 1$ points
farther than J; around J there are $m - 1$ points nearer than I and $N - m - 1$ points
farther than I.



It must also be noticed that:

(i) choosing an arbitrary point I, its nth nearest neighbor J is automatically set, and
there are $N - 1$ possibilities for it,

(ii) all possible combinations must be counted in distributing the $N - 2$ remaining
neighbors of J between the $m - 1$ points nearer than I and the $N - m - 1$ points farther
than I,

(iii) the same counting must be done for the $N - 2$ remaining neighbors of I.

Combining these three countings and those five distance restrictions, one has:

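\[ P^{(rl,N)}_{m,n} = \frac{(N-1)\,[(N-2)!]^2}{(m-1)!\,(N-m-1)!\,(n-1)!\,(N-n-1)!} \int_0^1 dx \left[\int_0^x dy\right]^{m+n-2} \left[\int_x^1 dy\right]^{2N-m-n-2} . \]

Since $\int_0^1 dx\, x^{m+n-2}(1-x)^{2N-m-n-2} = B(m+n-1, 2N-m-n-1) = (m+n-2)!\,(2N-m-n-2)!/[(2N-3)(2N-4)!]$, then:

\[ \frac{P^{(rl,N)}_{m,n}}{P^{(rl,N)}_{1,1}} = \mathrm{hypg}(N-2,\, N-2;\ m-1,\, n-1) , \tag{12} \]

\[ P^{(rl,N)}_{1,1} = \frac{N-1}{2N-3} , \tag{13} \]

with $m = 1, 2, 3, \ldots, N-1$ and $n = 1, 2, 3, \ldots, N-1$.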
Here one sees the emergence of the hypergeometric distribution (Eq. 19) and of the
couple density $P^{(rl,N)}_{1,1}$. These equations (Eqs. 12 and 13) reduce to Eqs. (10) and
(11) for $N \gg 1$.

Fig. 3 shows $P^{(rl,N)}_{m,n}$ as a function of n in a 10-point RLM. Notice that each curve
reaches its maximum at the reflexive case m = n and that the family of curves is symmetric
with respect to N/2, that is, $P^{(rl,N)}_{m,n} = P^{(rl,N)}_{N-m,N-n}$.
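The finite-size result is a ratio of binomial coefficients, so it can be evaluated exactly. The sketch below assumes the form $P^{(rl,N)}_{m,n} = [(N-1)/(2N-3)]\,\mathrm{hypg}(N-2, N-2; m-1, n-1)$ and uses it to list the rows of a 10-point RLM; the function names are chosen for this example.

```python
from fractions import Fraction
from math import comb


def hypg(N1, N2, n1, n2):
    """Hypergeometric distribution written with binomial coefficients."""
    return Fraction(comb(N1, n1) * comb(N2, n2), comb(N1 + N2, n1 + n2))


def rlm_finite(N, m, n):
    """Neighborhood probability in an N-point random link model."""
    return Fraction(N - 1, 2 * N - 3) * hypg(N - 2, N - 2, m - 1, n - 1)


N = 10
for m in range(1, N):
    row = [rlm_finite(N, m, n) for n in range(1, N)]
    peak = 1 + row.index(max(row))
    print(m, peak, sum(row))  # each row peaks at n = m and sums to 1

# For large N the random link value 3/16 is approached for (m, n) = (2, 3)
print(float(rlm_finite(10 ** 6, 2, 3)))
```

Exact rational arithmetic is used so that the normalization of each row can be checked as an equality rather than up to rounding.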




Figure 3. Neighborhood probabilities $P^{(rl,N)}_{m,n}$ as a function of n in a 10-point RLM.
The distributions are discrete and the lines are only a guide to the eye.



4. Random Map Model 

Breaking the distance symmetry constraint $D_{i,j} = D_{j,i}$ in the RLM leads to the RMM.
The RMM is the mean field approximation to several problems, for which analytical results
may be obtained. Cox probabilities can also be obtained for the RMM.

In the case in which the constraint $D_{i,i} = 0, \forall i$ is preserved, if an arbitrary point
I is chosen, its mth neighbor J is automatically set, but the nth neighbor of J is
equally likely to be any one of the other $N - 1$ points, since the distances are totally
independent. Thus, the probability $P^{(rm)}_{m,n}$ that the point I is the nth neighbor of its
mth neighbor is simply:

\[ P^{(rm)}_{m,n} = \frac{1}{N-1} , \tag{14} \]

where $m = 1, 2, \ldots, N-1$ and $n = 1, 2, \ldots, N-1$.
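The uniform result for the RMM with $D_{i,i} = 0$ can be illustrated by a small simulation. Because the distances seen from each point are independent and continuous, the neighbor list of a point is a uniformly random ordering of the other $N - 1$ points; the sketch below uses that equivalence (a shortcut adopted here, instead of drawing and sorting distances). The function name, seed and number of trials are assumptions of the example.

```python
import random


def estimate_rmm(N, m, trials=40000, seed=7):
    """Frequencies of the order n at which I appears in the list of its m-th neighbor J."""
    rng = random.Random(seed)
    counts = [0] * (N - 1)
    for _ in range(trials):
        # Neighbor list of I = 0: a random ordering of the other N - 1 points
        neighbors_of_I = list(range(1, N))
        rng.shuffle(neighbors_of_I)
        J = neighbors_of_I[m - 1]
        # Neighbor list of J, drawn independently (no distance symmetry)
        neighbors_of_J = [k for k in range(N) if k != J]
        rng.shuffle(neighbors_of_J)
        n = neighbors_of_J.index(0) + 1
        counts[n - 1] += 1
    return [c / trials for c in counts]


print(estimate_rmm(5, 2))  # every entry should be close to 1/(N - 1) = 0.25
```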

On the other hand, in the case in which $D_{i,i} \neq 0$ is allowed, the probability
$P^{(rm)}_{m,n}$ is twice as large for reflexive neighbors as for non-reflexive ones, because
now one must consider that every point is always its own mth nearest neighbor, for some m:

\[ P^{(rm)}_{m,n} = \frac{1 + \delta_{m,n}}{N+1} , \tag{15} \]

where $\delta_{m,n}$ is the Kronecker delta, $m = 1, 2, \ldots, N$ and $n = 1, 2, \ldots, N$.

Notice that in the thermodynamic limit $N \gg 1$, these cases are still distinguishable
due to the presence of the factor 2 for the reflexive neighbors.

5. Conclusion

Using only the Poisson distribution, Cox probabilities have been obtained through a simple
derivation and they have been identified with the multinomial distribution. Writing the
dimensionality parameter $p_d$ in terms of the normalized incomplete beta function allowed
us to obtain the high dimensional approximation for the neighborhood probabilities in
Poissonic processes (RPP, for instance), and a characteristic dimensionality $d_0 = 8$ has
arisen naturally.

Using the same line of reasoning, the neighborhood probabilities have been obtained
for finite-size RLM systems. In this case the probabilities have been identified with the
hypergeometric distribution. Also, simple expressions have been obtained for the RMM.

We are currently devoting efforts to obtaining the neighborhood probabilities
for finite-size and low-dimensionality systems.

Appendix: Some Discrete Statistical Distributions

In the following we briefly review some discrete distributions used here.

1.1. Infinite Population

Let us first consider infinite populations.

1.1.1. Multinomial Distribution Consider an infinite population whose objects can be
classified according to m distinct types, which occur with probabilities $p_1, p_2, \ldots, p_m$,
such that $\sum_{i=1}^{m} p_i = 1$. The probability that a uniform random sample has $n_1$ objects of
type 1, $n_2$ objects of type 2 and so on is given by the multinomial distribution:

\[ \mathrm{mult}(n_1, n_2, \ldots, n_m; p_1, p_2, \ldots, p_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!}\, p_1^{n_1} p_2^{n_2} \cdots p_m^{n_m} , \tag{16} \]

where $n = n_1 + n_2 + \cdots + n_m$ is the sample size.
1.1.2. Binomial Distribution If the infinite population has only two distinct types of
objects (usually referred to as success and failure), the multinomial distribution becomes
the binomial distribution:

\[ \mathrm{bin}(n_1, n_2; p_1, p_2) = \frac{n!}{n_1!\, n_2!}\, p_1^{n_1} p_2^{n_2} = \binom{n}{n_1} p_1^{n_1} (1 - p_1)^{n - n_1} , \tag{17} \]

where $n = n_1 + n_2$ and $p_2 = 1 - p_1$.

1.1.3. Poisson Distribution If $n \to \infty$ and $p_1 \to 0$ such that the average $\mu = n p_1$
remains finite, the binomial distribution becomes the Poisson distribution:

\[ \mathrm{pois}(n_1) = \frac{\mu^{n_1} e^{-\mu}}{n_1!} , \tag{18} \]

where the only distribution parameter is the average $\mu = \langle n_1 \rangle$.

1.2. Finite Population

Consider a finite population with N objects, such that $N_1$ objects are of type 1 and the
remaining $N_2 = N - N_1$ objects are of type 2. The probability that a uniform random
sample, drawn without replacement, has $n_1$ objects of type 1 and $n_2$ objects of type 2
is given by the hypergeometric distribution:

\[ \mathrm{hypg}(N_1, N_2; n_1, n_2) = \frac{\binom{N_1}{n_1} \binom{N_2}{n_2}}{\binom{N}{n}} , \tag{19} \]

where $n = n_1 + n_2$ is the sample size. In the limit $N_1 \to \infty$ and $N_2 \to \infty$, with the
probabilities $p_1 = N_1/N$ and $p_2 = N_2/N$ kept fixed, the hypergeometric distribution
becomes the binomial one.
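The convergence of the hypergeometric distribution to the binomial one can be seen numerically by growing the population at fixed $p_1 = N_1/N$. The sketch below uses the standard binomial-coefficient forms of both distributions; the function names and the sample sizes are choices made for this illustration.

```python
from math import comb


def hypg(N1, N2, n1, n2):
    """Hypergeometric probability of n1 type-1 and n2 type-2 objects in the sample."""
    return comb(N1, n1) * comb(N2, n2) / comb(N1 + N2, n1 + n2)


def binomial(n1, n2, p1):
    """Binomial probability of n1 successes and n2 failures."""
    return comb(n1 + n2, n1) * p1 ** n1 * (1.0 - p1) ** n2


# Fixed p1 = 0.3 and sample (n1, n2) = (2, 3); the population grows by factors of 100
for scale in (1, 100, 10000):
    print(scale, hypg(3 * scale, 7 * scale, 2, 3), binomial(2, 3, 0.3))
```

The difference between the two columns shrinks roughly in proportion to 1/N, which is the same mechanism by which the finite-size RLM probabilities reduce to the random link ones for $N \gg 1$.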